ABSTRACT

Escherichia coli can rapidly switch to the metabol- 
ism of L-arabinose and D-xylose in the absence of its 
preferred carbon source, glucose, in a process 
called carbon catabolite repression. Transcription 
of the genes required for L-arabinose and D-xylose 
consumption is regulated by the sugar-responsive 
transcription factors, AraC and XylR. E. coli repre- 
sents a promising candidate for biofuel production 
through the metabolism of hemicellulose, which 
is composed of D-xylose and L-arabinose. 
Understanding the L-arabinose/D-xylose regulatory 
network is key for such biocatalyst development. 
Unlike AraC, which is a well-studied protein, little 
is known about XylR. To gain insight into XylR 
function, we performed biochemical and structural 
studies. XylR contains a C-terminal AraC-like 
domain. However, its N-terminal D-xylose-binding 
domain contains a periplasmic-binding protein 
(PBP) fold with structural homology to LacI/GalR
transcription regulators. Like LacI/GalR proteins,
the XylR PBP domain mediates dimerization.
However, unlike LacI/GalR proteins, which
dimerize in a parallel, side-to-side manner, XylR 
PBP dimers are antiparallel. Strikingly, D-xylose 
binding to this domain results in a helix to strand 
transition at the dimer interface that reorients both 
DNA-binding domains, allowing them to bind and 
loop distant operator sites. Thus, the combined 
data reveal the ligand-induced activation mechan- 
ism of a new family of DNA-binding proteins. 



INTRODUCTION 

Bacteria rapidly switch to the use of the most accessible 
energy source by inhibiting the synthesis of proteins 
involved in the catabolism of unavailable carbon metab- 
olites. This preferential pattern of sugar metabolism 
has been termed carbon catabolite repression (CCR) or 
diauxie (1). In the absence of adequate stores of the 
preferred carbon source, glucose, Escherichia coli can 
rapidly change to the metabolism of L-arabinose and 
D-xylose. These sugars are transported into E. coli by 
the transporters AraFGH and XylFGH, respectively, 
and metabolized by similar gene clusters encoding 
isomerases and kinases (araBA and xylAB) (1-5). 
Transcription of the genes necessary for the consumption 
of each sugar is regulated by a sugar-responsive transcrip- 
tion factor: AraC regulates arabinose-responsive operons 
and XylR activates D-xylose-responsive genes (3,4). 
Interestingly, recent data have shown that AraC binds to 
both the L-arabinose and D-xylose responsive promoters 
and acts as an activator in the former and repressor in the 
latter (4,5). As a result, L-arabinose is metabolized before 
D-xylose. 

E. coli has potential as a biocatalyst for production 
of biofuels because it can metabolize all sugars in plant 
materials. The two most abundant sugars in ligno- 
cellulosic sources are glucose and D-xylose. E. coli 
mutants with a constitutively active cAMP receptor 
protein (CRP) are able to simultaneously consume 
glucose and D-xylose (6). With the broader goal of 
generating an E. coli biocatalyst that can co-metabolize 
all biomass sugars, it would be necessary to also eliminate 
the diauxie between D-xylose and L-arabinose, as these two 
sugars comprise 95% of the total sugar in hemicellulose
(6,7). Indeed, the fact that E. coli consumes D-xylose
only after it consumes L-arabinose prevents it from effecting
a complete bioconversion of sugar mixtures into fuels.
Notably, recent studies showed that this hierarchy could
be disrupted, allowing for the equal consumption of
both L-arabinose and D-xylose, if the intracellular levels of
XylR were increased (5). The resultant engineered bacteria 
were able to produce 36% more ethanol compared with 
wild-type E. coli. Thus, understanding how XylR func- 
tions at the atomic level would not only provide insight 
into CCR but also ways to alter the efficacy of XylR as a 
transcription activator, which may lead to the develop- 
ment of an improved E. coli biocatalyst. 

Unlike AraC, which is a well-studied protein, little is 
known about the XylR protein. Data show that XylR
activates transcription by binding to 37 bp consensus
DNA operator sites found in the promoters of the genes
it regulates (2,3). AraC inhibits metabolism of D-xylose by 
binding the same promoter sites. XylR is a 392 amino acid 
protein, and residues 304-392 show weak similarity to the 
DNA-binding domain of AraC proteins (8-15). However, 
its N-terminal domain is nearly twice the size of most 
AraC proteins and shows no homology to any charac- 
terized protein. The AraC family of transcriptional regu- 
lators is defined by a 100-residue region of sequence 
similarity that forms an independently folding 
DNA-binding domain composed of two helix-turn-helix 
(HTH) motifs (8). Members fall into three functional 
groups depending on the types of genes that they 
regulate. Those that regulate carbon metabolism, such as 
E. coli AraC, are active as dimers and respond to small 
molecule effectors that bind to the protein N-terminal 
domain. Those members that are involved in stress responses,
such as SoxS, Rob and MarA, typically function
as monomers. The third group is involved in regulating
virulence gene expression and includes the Vibrio cholerae
ToxT protein. Structures of the DNA-binding domains of
AraC, Rob and MarA have been determined, including
structures of the Rob and MarA domains in complex
with cognate DNA (9-12). In the MarA-DNA structure,
the recognition helices of the two motifs insert into adjacent
major grooves on the same face of the DNA, making base-specific
contacts (9). The tandem arrangement of two
helix-turn-helix DNA-binding domains allows for the
recognition of ~20 bp.

Dimerization by AraC proteins further enhances DNA- 
binding specificity, as it permits the insertion of four 
helix-turn-helix motifs onto the DNA. The N-terminal 
domain of AraC binds L-arabinose and also functions in 
dimerization. This domain is flexibly attached to its 
C-terminal DNA-binding domain. Structures of the 
N-terminal domains have been obtained for AraC, 
E. coli Rob and V. cholerae ToxT (11-14). The AraC 
N-terminal domain contains a flexible N-terminal arm 
connected to an eight-stranded antiparallel β-barrel.
L-arabinose binds in a pocket within the β-barrel. The
N-terminal domains of ToxT and Rob are structurally
similar to the AraC L-arabinose-binding domain. Indeed,
all contain an eight-stranded antiparallel β-sheet. In ToxT,
this domain binds fatty acids, whereas its function in Rob 
is as yet unknown. Surprisingly, despite the wealth of data 
on AraC proteins, the molecular details by which the 



signal of ligand binding to the N-terminal domains of 
these proteins is communicated to their DNA-binding 
regions to effect transcription regulation are still unclear. 
To gain insight into the function of the atypical AraC 
protein, XylR, we performed biochemical and structural 
studies. These combined studies reveal a new structural 
family of DNA-binding proteins and also how ligand 
binding is communicated from a separate N-terminal 
ligand-binding domain to an AraC-like DNA-binding 
domain to activate it for DNA binding. These data also 
provide structural insight that may aid in the development 
of more efficient biocatalysts. 

MATERIALS AND METHODS 

Purification of E. coli XylR 

The xylR gene was amplified from DH5α genomic DNA
by polymerase chain reaction (PCR) and cloned into the
pET28a expression vector such that a C-terminal His-tag
was added for purification purposes. BL21(DE3) competent
cells were transformed with the xylR-pET28a vector,
and the resultant protein was expressed by inducing with
0.5 mM isopropyl-β-D-thiogalactoside (IPTG) for 16 h
at 15°C. Cells were lysed in a buffer consisting of 20 mM 
Tris pH 8.0, 300 mM NaCl and 10 mM imidazole by a 
microfluidizer, and the lysate was loaded onto a 
Ni-NTA column. After extensive washing, XylR protein 
was eluted using 20 mM Tris pH 8.0, 300 mM NaCl, 
300 mM imidazole and then dialysed into 20 mM Tris 
pH 8.0, 300 mM NaCl. 

Crystallization and data collection of XylR crystals

Apo XylR crystals were grown by mixing the protein at 
a concentration of 3 mg/ml with 25% PEG 3350, 0.1 M
Tris pH 8.0 at a protein to drop ratio of 1:1. The crystals 
grew to maximum size in 2 weeks. The crystals were 
cryo-protected by dipping them into a solution consisting 
of the crystallization solution supplemented with 25% 
glycerol for 2-5 sec followed by direct placement in the 
liquid nitrogen stream. The crystals are trigonal, space
group P3₂21 with a = b = 124.5 Å and c = 189.8 Å, and
diffracted to 3.4 Å at synchrotron sources. Data were successfully
collected on only one crystal; the crystals were
fragile and typically diffracted to only 5.0 Å. The crystal
contains three subunits in the crystallographic asymmetric 
unit (ASU); two subunits form a dimer and the third 
subunit forms a dimer with itself via crystallographic 
symmetry. Crystals of the XylR-D-xylose complex were 
obtained via hanging drop vapor diffusion using 
100 mM Tris pH 8.0, 1.3 M Li2SO4, 50 mM NaSCN,
4 mM 1,4-dithio-DL-threitol (DTT) and 4 mM D-xylose
as a crystallization condition. The crystals took several 
days to grow and reached their maximum size in a week. 
The crystals were cryo-preserved by a quick dip (< 1 sec) in 
a solution containing the crystallization reagent supple- 
mented with 25% sucrose and maintaining the D-xylose 
concentration at 4 mM. These crystals were tetragonal,
space group P4₂2₁2 with a = b = 70.0 Å and
c = 215.4 Å, and contain one subunit in the ASU. X-ray intensity
data were collected at 100 K at beamline 8.3.1 at the
Advanced Light Source (ALS) in Berkeley. Data were
integrated with MOSFLM, then merged and scaled with
SCALA in CCP4 (16).
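
As a plausibility check on the crystal contents reported above, the Matthews coefficient can be estimated from the unit-cell dimensions, the space group and the number of subunits in the ASU. The sketch below is illustrative only and is not part of the original analysis. It assumes an average residue mass of 110 Da for the 392-residue XylR chain (the text does not state a molecular weight) and uses the common 1.23/V_M approximation for solvent fraction; the function names are hypothetical.

```python
import math

# Assumption: ~110 Da per residue; the text gives only the chain length.
AVG_RESIDUE_MASS_DA = 110.0
XYLR_RESIDUES = 392


def cell_volume(a, b, c, gamma_deg=90.0):
    """Unit-cell volume in A^3 for a cell with alpha = beta = 90 degrees."""
    return a * b * c * math.sin(math.radians(gamma_deg))


def matthews_coefficient(volume, sym_ops, copies_per_asu, mass_da):
    """Matthews coefficient V_M in A^3/Da."""
    return volume / (sym_ops * copies_per_asu * mass_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from V_M (1 - 1.23/V_M estimate)."""
    return 1.0 - 1.23 / vm


mass = XYLR_RESIDUES * AVG_RESIDUE_MASS_DA

# Apo XylR: trigonal P3(2)21 has 6 symmetry operators and gamma = 120 degrees;
# three subunits per asymmetric unit, as stated in the text.
vm_apo = matthews_coefficient(cell_volume(124.5, 124.5, 189.8, 120.0), 6, 3, mass)

# XylR-D-xylose: tetragonal P4(2)2(1)2 has 8 symmetry operators; one subunit.
vm_xyl = matthews_coefficient(cell_volume(70.0, 70.0, 215.4), 8, 1, mass)

print(f"apo XylR:         V_M = {vm_apo:.2f} A^3/Da, solvent ~ {solvent_fraction(vm_apo):.0%}")
print(f"D-xylose complex: V_M = {vm_xyl:.2f} A^3/Da, solvent ~ {solvent_fraction(vm_xyl):.0%}")
```

Under these assumptions both crystal forms come out with a V_M a little above 3 Å³/Da, which is within the range commonly seen for protein crystals.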

Structure determination and refinement of apo XylR and 
the XylR-D-xylose complex 

The structure of the XylR-D-xylose complex was solved
to 2.90 Å by multi-wavelength anomalous diffraction
(MAD) using crystals grown with selenomethionine- 
substituted protein. Selenomethionine-substituted XylR 
was produced using the methionine inhibitory pathway 
method. The selenomethionine-substituted protein was 
purified as for wild type, with the exception that 5 mM
β-mercaptoethanol (β-ME) was included in all buffers.
MAD data were collected on these crystals, and the 
selenium sites located with SOLVE, resulting in a figure 
of merit of 0.67 (17). The phases were improved by density 
modification using the program RESOLVE. The structure 
was then manually built using Coot (18) and refined in 
REFMAC5 (19). The final XylR-D-xylose structure
includes residues 1-46, 55-389, 1 D-xylose molecule and
53 solvent molecules and has Rwork/Rfree values of
21.9%/27.9% to 2.90 Å resolution. The apo XylR structure
was solved by molecular replacement (using MolRep)
with the XylR structure (minus the D-xylose). This structure
contains residues 1-46, 55-389 of each of the three
subunits and was minimally refined in CNS to Rwork/Rfree
values of 28.9%/31.6% to 3.40 Å resolution (20). Selected
data collection and refinement statistics are given in 
Table 1. 



Atomic force microscopy (AFM) sample preparation and
imaging 

The 371 bp DNA construct (xyl Promoter) used encompasses
the xyl promoter and spans the region between the
start codons of xylA and xylF (21,22). This region
contains two outwardly directed promoters, IA and IF.
The xyl Promoter DNA construct was amplified from
E. coli MG1655 genomic DNA with the following
primers: 5'-ATATTGAACTCCATAATCAGGTAATGC-3' (forward)
and 5'-CATGGTGTAGGGCCTTCTGT-3' (reverse).
The 900 bp DNA construct (IFIF900)
encodes two IF promoter sites separated by 500 bp with
200 bp flanking each terminus. The latter construct was
used to clearly visualize DNA looping. AFM samples
were prepared with 20 µM of XylR-DNA, with a ratio
of 2:1 (protein:DNA), in binding buffer (75 mM NaCl,
20 mM Tris-HCl pH 7.5 and 2 mM D-xylose). For sample
deposition, a specially modified 1-(3-aminopropyl)silatrane
(APS) mica surface was used. The APS mica was obtained
by incubation of freshly cleaved mica in 167 nM
1-(3-aminopropyl)silatrane. The details of APS mica
surface modification are described in (23,24). The sample
droplet (5-10 µl) was deposited on APS mica and, after a
2 min incubation, excess sample was washed off with deionized
water (AquaMax Ultra, LabWater.com) and dried with
an argon gas flow. AFM images in air were acquired
using a MultiMode AFM NanoScope IV system (Veeco/Bruker
Instruments, Santa Barbara, CA, USA) operating
in tapping mode. Regular tapping mode silicon probes
(Olympus from Asylum Research, Santa Barbara, CA,
USA) with a spring constant of 42 N/m and a resonant
frequency of 300-320 kHz were used.
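
The primer pair quoted above can be sanity-checked with a few lines of code. The sketch below is illustrative only: it assumes that the spaces inside the extracted forward primer are typesetting artefacts, so that the primer reads as one contiguous sequence, and the helper names are hypothetical.

```python
# Primer sequences as quoted in the text (spaces in the extracted forward
# primer are assumed to be artefacts and have been removed).
FORWARD = "ATATTGAACTCCATAATCAGGTAATGC"
REVERSE = "CATGGTGTAGGGCCTTCTGT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    print(f"{name}: {len(primer)} nt, GC = {gc_fraction(primer):.2f}, "
          f"template-strand match = {reverse_complement(primer)}")
```

The reverse primer begins with CAT, the reverse complement of an ATG start codon, which fits the statement that the amplified region is bounded by the start codons of the two divergently transcribed genes.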



Table 1. Selected crystallographic data for XylR structures

Selenomethionine MAD data for XylR-D-xylose

                              Peak             Inflection       Remote
Energy (eV)                   12678.8          12676.8          12979.6
Resolution (Å)                107.83-2.90      107.83-2.90      107.83-2.90
Overall Rsym (%) (a)          7.5 (36.6) (b)   7.5 (36.2)       7.9 (38.5)
Overall I/σ(I)                30.2 (8.0)       29.9 (8.0)       29.2 (7.6)
# Total reflections           196 548          196 226          196 403
# Unique reflections          12 701           12 729           12 704
Multiplicity                  8.6              8.5              8.6
Overall Figure of Merit (c)   0.67

Refinement statistics

Structure/PDB ID code         XylR-D-xylose/4FE7   apo XylR/4FE4
Resolution (Å)                66.67-2.90           107.83-3.40
Overall Rsym (%) (a)          7.9 (38.5)           11.7 (49.6)
Overall I/σ(I)                29.2 (7.6)           7.0 (1.7)
# Total reflections           196 403              22 188
# Unique reflections          12 022               2286
% complete                    100 (100)            97.0 (98.5)
Rwork/Rfree (%) (d)           21.9/27.9            28.9/31.6
Rmsd
  Bond angles (°)             1.275                1.60
  Bond lengths (Å)            0.010                0.011
Ramachandran analysis
  Most favoured (%)           89.4                 78.9
  Additional allowed (%)      10                   20.2
  Generously allowed (%)      0.6                  0.9
  Disallowed (%)              0.0                  0.0

(a) $R_{\mathrm{sym}} = \sum\sum |I_{hkl} - I_{hkl}(j)| / \sum I_{hkl}$, where $I_{hkl}(j)$ is the observed intensity and $I_{hkl}$ is the final average value of intensity.
(b) Values in parentheses are for the highest resolution shell.
(c) Figure of Merit $= \langle |\sum P(\alpha) e^{i\alpha} / \sum P(\alpha)| \rangle$, where $\alpha$ is the phase and $P(\alpha)$ is the phase probability distribution.
(d) $R_{\mathrm{work}} = \sum ||F_{\mathrm{obs}}| - |F_{\mathrm{calc}}|| / \sum |F_{\mathrm{obs}}|$ and $R_{\mathrm{free}} = \sum ||F_{\mathrm{obs}}| - |F_{\mathrm{calc}}|| / \sum |F_{\mathrm{obs}}|$, where all reflections belong to a test set of 5% randomly selected data.
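
The residuals defined in the Table 1 footnotes can be illustrated with a short calculation. The following is a minimal sketch on made-up structure-factor amplitudes, not a reproduction of the REFMAC5 or CNS calculations; the function names, the toy data and the random seed are all hypothetical. It shows only how the same residual is evaluated separately over a working set and a 5% test set.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_indices(n, test_fraction=0.05, seed=0):
    """Randomly assign a fraction of reflection indices to the test (free) set."""
    rng = random.Random(seed)
    n_test = max(1, round(n * test_fraction))
    test = set(rng.sample(range(n), n_test))
    work = [i for i in range(n) if i not in test]
    return work, sorted(test)


# Toy amplitudes (hypothetical): every calculated value is 10% too low.
f_obs = [100.0 + 5.0 * i for i in range(40)]
f_calc = [0.9 * f for f in f_obs]

work_idx, test_idx = split_indices(len(f_obs))
r_work = r_factor([f_obs[i] for i in work_idx], [f_calc[i] for i in work_idx])
r_free = r_factor([f_obs[i] for i in test_idx], [f_calc[i] for i in test_idx])
print(f"Rwork = {r_work:.3f}, Rfree = {r_free:.3f} ({len(test_idx)} test reflections)")
```

In real refinement the test-set reflections are excluded from the target function, which is why Rfree is normally higher than Rwork; the toy data here are constructed so that both residuals coincide.
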
Fluorescence polarization assays

Fluorescence polarization assays of XylR-DNA binding
were performed as described (25). All oligonucleotides
used in the assays were 5'-fluorescein labelled. In each
assay, 1 nM oligonucleotide was added to the binding
buffer (75 mM NaCl, 20 mM Tris pH 7.5, 2 µg/ml
poly[dI-dC], ± 2 mM D-xylose), and increasing concentrations
of protein were titrated into the binding mixture.
The excitation and emission wavelengths were 490 nm 
and 530 nm, respectively. The data were fit using 
Kaleidagraph as previously described (25). 
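
The fitting equation itself is given in reference (25) and is not reproduced here. As an illustration only, the sketch below assumes a simple 1:1 hyperbolic binding model, which is reasonable when the labelled DNA (1 nM) is well below the dissociation constant, and recovers Kd from synthetic data by a brute-force grid search. The model, the polarization baselines and all names are assumptions made for this example and are not taken from the original analysis.

```python
def polarization(protein_nM, kd_nM, p_free, p_bound):
    """Polarization for a 1:1 hyperbolic model (assumes [DNA] << Kd)."""
    fraction_bound = protein_nM / (kd_nM + protein_nM)
    return p_free + (p_bound - p_free) * fraction_bound


def fit_kd(conc, signal, p_free, p_bound, kd_grid):
    """Return the Kd on the grid that minimises the sum of squared residuals."""
    def sse(kd):
        return sum((polarization(c, kd, p_free, p_bound) - s) ** 2
                   for c, s in zip(conc, signal))
    return min(kd_grid, key=sse)


# Synthetic titration (hypothetical values): true Kd of 30 nM,
# baselines of 50 and 150 in arbitrary polarization units.
conc = [0, 5, 10, 20, 40, 80, 160, 320, 640]
synthetic = [polarization(c, 30.0, 50.0, 150.0) for c in conc]

kd_grid = [k / 10 for k in range(10, 2001)]  # 1.0 to 200.0 nM in 0.1 nM steps
best_kd = fit_kd(conc, synthetic, 50.0, 150.0, kd_grid)
print(f"best-fit Kd = {best_kd:.1f} nM")
```

A least-squares optimiser would normally replace the grid search and would also fit the two baselines; the grid keeps the example free of external dependencies.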

Isothermal titration calorimetry (ITC) experiments 

All ITC experiments were performed using a VP-ITC 
system (MicroCal Inc., Northampton, MA, USA). The 
ITC experiments were performed with either D-xylose or
L-arabinose (in the syringe) at a concentration of 1 mM
and 100 µM XylR (in the sample cell). Ligand was
titrated into the sample cell containing XylR at 25°C,
and the resulting isotherm was fitted with Origin. The
XylR sample was placed into the ITC buffer (75 mM
NaCl, 20 mM Tris-HCl pH 7.5) via dialysis, and the two
ligand samples were dissolved in the dialysis buffer.

RESULTS AND DISCUSSION 

Overall structure of E. coli apo XylR and the 
XylR-D-xylose complex 

To gain insight into the molecular functions of XylR, we 
determined structures of apo XylR and a XylR-D-xylose 
complex. The XylR-D-xylose complex structure was
solved by MAD to 2.90 Å and refined to Rwork/Rfree
of 21.9%/27.9%. The apo XylR structure was solved by
molecular replacement using the XylR structure from the
XylR-D-xylose complex as a search model ('Materials and
Methods' section). The final apo XylR model was refined
to Rwork/Rfree of 28.9%/31.6% to 3.40 Å resolution
(Table 1; 'Materials and Methods' section). The structures
show that XylR is composed of two domains, an 
N-terminal domain (residues 1-274) and a C-terminal 
domain (residues 285-389). The N-terminal domain has
an α/β fold, whereas the C-terminal domain is all helical
and composed of seven helices. The two domains are connected
by a long, but structured, linker formed by
residues 275-284 (Figure 1A and B). As predicted, the 
C-terminal domain shows structural similarity to AraC 
proteins. This domain of XylR shows the strongest structural
homology with the corresponding domain of the
MarA protein, with Cα superimposition resulting in
a root mean squared deviation (rmsd) of 2.1 Å.
However, the N-terminal domain of XylR is distinct
from the β-barrel-like ligand-binding domain in other
AraC proteins (11-13). Structural homology searches
revealed that this domain shows the strongest similarity
to the periplasmic-binding protein (PBP) domains of the
LacI/GalR family of transcription regulators (27-33). In
particular, the XylR N-terminal domain is most
similar to the PBP domain of the purine repressor,
PurR (27,28). One subunit of XylR can be superimposed
onto a PurR subunit with an rmsd of 2.5 Å for 212 corresponding
Cα atoms (Supplementary Figure S1).
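
For readers unfamiliar with the rmsd values quoted above, the quantity is the root of the mean squared distance between paired Cα atoms after the two structures have been superimposed. The sketch below is a minimal illustration on hypothetical coordinates; it assumes the coordinates are already optimally superimposed and omits the least-squares rotation step that structure-comparison programs perform first.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between paired, pre-superimposed 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical Calpha positions (in angstroms) for two short, aligned segments.
segment_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
segment_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

print(f"rmsd = {rmsd(segment_a, segment_b):.2f} A")
```

Because the sum runs only over atom pairs judged to correspond, the number of pairs (212 Cα atoms in the PurR comparison) should always be read alongside the rmsd itself.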

The E. coli XylR structure combines two domains pre- 
viously not found in the same protein, an N-terminal 
D-xylose-binding domain that is homologous to those of 
the LacI/GalR family and a C-terminal DNA-binding 
domain with an AraC-like DNA-binding fold. Thus, the 
XylR structure defines a new family of DNA-binding 
proteins. BLAST searches, which revealed > 100 proteins 
with strong sequence homology to E. coli XylR, suggest 
that this 'XylR family', which consists of an N-terminal 
effector binding domain with a PBP fold connected to 
an AraC-like DNA-binding domain, is widespread in
Gram-negative bacteria. The Caulobacter crescentus
XylR protein, however, represents an exception. Recent
sequence homology analyses predict that this protein is a
bona fide member of the LacI/GalR proteins, with an
N-terminal HTH domain followed by a hinge helix and
C-terminal PBP-like domain (34). These findings suggest
a possible role for domain swapping during the evolution of
XylR proteins. Like LacI/GalR and periplasmic-binding 
proteins, the N-terminal region of XylR is composed of 
two α/β subdomains (herein called subdomains 1 and 2),
which are connected by short crossover regions that, in the
PBPs and LacI proteins, permit rotation between subdomains
(Figure 1A). This subdomain movement allows
the protein to trap a ligand once it has entered the 
binding cavity. The resulting subdomain movement can 
be transmitted to attached regions to elicit other effects, 
such as conformational changes or folding of attached 
domains. 

XylR contains a periplasmic binding fold used in 
antiparallel dimerization 

Examination of the packing of the XylR crystal structures 
suggests that, like the LacI/GalR proteins, XylR dimerizes 
via its PBP-like domain (Figure 1C and D). The XylR 
dimer interface buries an extensive 2960 Å² of protein
surface from solvent. Size exclusion chromatography experiments
revealed molecular weights consistent with a
XylR dimer (Supplementary Figure S2). Like LacI/GalR
proteins, the dimerization interface of XylR is formed primarily
by interactions between the PBP domains.
However, in sharp contrast to the LacI/GalR proteins, 
which dimerize via parallel interactions between PBP 
regions, the XylR PBP domains interact in an antiparallel 
fashion. Also distinct from LacI/GalR oligomers, the PBP
folds of XylR make extensive interactions with the
C-terminal DNA-binding domain of the other subunit 
in the XylR dimer (Figure 1C and D). These contacts 
are made between the DNA-binding domain and 
subdomain 1 from the other subunit. 

The only other PBP-containing proteins that use a 
form of antiparallel dimerization are the LysR family of 
transcription regulators (35). LysR proteins contain an 
N-terminal winged helix DNA-binding domain, which is 
unrelated to the DNA-binding domains of LacI/GalR 
proteins or XylR. Although the PBP domains of LysR
proteins and XylR both dimerize in an antiparallel mode,
the subunit structure/PBP folds of these proteins are 




Figure 1. Structure of E. coli XylR defines a new DNA-binding family. (A) Overall structure of a XylR subunit. α-helices and β-strands are coloured
red and yellow, respectively, and labelled, and loops are coloured green. The bound D-xylose is shown as CPK, with carbons and oxygens coloured
cyan and red, respectively. The DNA-binding domain and PBP subdomains 1 and 2 are labelled. (B) Topology diagram of the XylR subunit with
α-helices and β-strands coloured as in Figure 1A. The residues contained within each secondary structural element are also indicated. The asterisk
indicates the region encompassing residues 221-229, which is a helix in the apo form and a strand in the D-xylose bound form (the latter of
which is shown here). (C) Two views of the XylR dimer (rotated by 90°). One subunit is coloured as in Figure 1A and other is coloured dark blue. 
(D) Electrostatic surface representation of the XylR dimer (shown in the same orientations as Figure 1C). Electropositive and electronegative regions 
are coloured blue and red, respectively. This Figure, Figures 2A-B, 3A-B and 4F were made with PyMOL (26). 



significantly different; the PBPs of XylR and LysR proteins
superimpose with rmsds of 4.6-5.5 Å (35). Also, XylR dimerization
involves interactions between its DNA-binding
domain and subdomain 1 of the other PBP subunit in its 
dimer, which is not observed in LysR proteins. The XylR 
antiparallel PBP dimerization mode results in an arrange- 
ment in which the DNA-binding domains are found on 
opposite ends of the dimer. The long linker allows for 
the formation of an oligomer with two faces, one contain- 
ing the DNA-binding domains and the other, the PBP anti- 
parallel dimer (Figure 1C and D). This arrangement leaves 



both the DNA-binding and ligand-binding domains unob- 
structed such that each domain can bind its ligand without 
impediment from the other domain. 

D-xylose binding to XylR 

To ascertain how D-xylose affects XylR function, we 
co-crystallized XylR in the presence of D-xylose. Clear 
density was observed for a D-xylose molecule in the 
pocket between the subdomains of the XylR PBP 
domain (Supplementary Figure S3). Stacking interactions 
to the D-xylose molecule are provided by Tyr18 from
subdomain 1 and Trp135 from subdomain 2 (Figure 2A
and B). Every oxygen moiety of the D-xylose sugar is contacted
by one or several XylR residues, with the exception
of the xylose O5 atom. Asp65 from subdomain 1 contacts
the D-xylose O4 hydroxyl, and the remaining hydrogen
bonds are provided by subdomain 2 residues. Gln237
contacts the O3 hydroxyl, the Oδ atoms of Asp219
hydrogen bond with the O1 and O2 hydroxyls, while
both Nη atoms of Arg139 contact the O2 and O3 hydroxyls.
Previous studies on periplasmic-binding proteins
and LacI/GalR members have shown that ligand binding 
captures or stabilizes the closed conformation of the PBP 
clamshell fold. Indeed, D-xylose binding requires the 
specific arrangement of residues only found in the closed 
state. It is also interesting to note that XylR residue 
Gln237 is located on one of the three cross-overs that 
connect the subdomains. Thus, this Gln237-D-xylose 
contact likely also stabilizes the closed state. 

In vivo studies have shown that XylR does not respond 
to L-arabinose despite its structural similarity to D-xylose. 



Consistent with this, modelling of L-arabinose in the XylR
binding pocket revealed a significant clash between the
L-arabinose O4 hydroxyl and the XylR Trp135 side
chain (O4 to Trp135 Cε2 distance of 2.1 Å) (Figure 2B).
However, these modelling exercises were carried out
assuming that L-arabinose would bind in the same orientation
in the pocket as D-xylose (see Figure 2B), and other
binding modes cannot be ruled out. Thus, to determine
the binding affinities of XylR for D-xylose and
L-arabinose, we performed ITC studies. Clear binding of
XylR to D-xylose was observed, resulting in a Kd of
3.3 ± 0.5 µM (Figure 2C; Supplementary Figure S4).
Consistent with the XylR-D-xylose structure, these experi- 
ments also revealed a binding stoichiometry of 1 subunit 
of XylR to 1 molecule of D-xylose. Also consistent with 
the structure, ITC experiments revealed no binding of 
L-arabinose by XylR (Figure 2C). 
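
To put the measured affinity in context, the sketch below evaluates a standard 1:1 binding model at the concentrations used in the ITC experiment (100 µM XylR in the cell, Kd of 3.3 µM). It is an illustration under stated assumptions rather than the Origin fit used in the study: dilution of the cell contents during the titration is ignored, and the function names are hypothetical.

```python
import math


def complex_concentration(p_total, l_total, kd):
    """Concentration of the 1:1 complex from the exact quadratic solution."""
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0


def c_value(p_total, kd, n=1.0):
    """Dimensionless ITC c parameter, c = n * [protein] / Kd."""
    return n * p_total / kd


KD_UM = 3.3      # reported Kd for D-xylose, in micromolar
XYLR_UM = 100.0  # XylR concentration in the sample cell, in micromolar

print(f"c = {c_value(XYLR_UM, KD_UM):.1f}")
for molar_ratio in (0.5, 1.0, 2.0):
    bound = complex_concentration(XYLR_UM, molar_ratio * XYLR_UM, KD_UM)
    print(f"ligand:protein = {molar_ratio:.1f} -> {bound / XYLR_UM:.0%} of XylR bound")
```

A c value in the tens gives a sigmoidal isotherm from which both Kd and stoichiometry can be estimated, which is in line with the 1:1 stoichiometry reported above.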

Only when L-arabinose is exhausted will D-xylose bound 
XylR activate the expression of the D-xylose metabolic 
genes in the presence of hemicellulose food sources. 
Recent studies have shown, however, that this diauxie 





Figure 2. D-xylose binding by XylR. (A) Close up of the D-xylose-XylR interaction. XylR residues that make key interactions with D-xylose are 
shown as sticks and labelled. (B) Comparison of XylR-D-xylose complex with a model of a XylR-L-arabinose complex. As indicated by the 
transparent surface representations, L-arabinose binding in this mode would result in significant clash with Trp135. (C) ITC studies on D-xylose
(left) and L-arabinose (right) binding to XylR. The binding isotherm of XylR for D-xylose resulted in a Kd of 3.3 ± 0.5 µM with a stoichiometry of 1
XylR: 1 D-xylose. By contrast, L-arabinose showed no binding by XylR. 
can be altered, leading to the design of a more efficient
E. coli biocatalyst, by controlled overexpression of XylR.
This overexpression allows XylR-D-xylose to compete
with AraC-L-arabinose for binding to xyl promoters (5).
Additional strategies in the design of an E. coli biocatalyst
would be to engineer a XylR protein that is responsive to 
both D-xylose and L-arabinose or identify a small molecule 
that can readily cross the membrane (as high affinity 
D-xylose transporters are only highly expressed upon xyl 
transcription activation) and bind with nanomolar affinity 
to XylR. The use of such a high affinity small molecule 
ligand would also alleviate the need for a bioengineering 
step. The details of the D-xylose-binding pocket provided 
by the XylR-D-xylose structure should significantly facili- 
tate such design efforts. 

Comparison of apo XylR and XylR-D-xylose structures; 
D-xylose binding leads to a helix to strand transition 
and reorganization of the XylR dimer 

The XylR DNA-binding domain'-subdomain 1 interface
(where the prime indicates the other subunit in the dimer) is
formed by contacts between β1, β2, α1 and α10 of
subdomain 1 with helices α11' and α13' of the
DNA-binding domain. Key hydrophobic and stacking
interactions in this interface are formed between Phe2
with His297' and Tyr29 and Ala32 with Met359'. In
addition, there are numerous salt bridge and hydrogen
bonding interactions. Specifically, Glu28 interacts with
Thr349', Glu36 with Lys300', and Lys248 and Tyr244 with
Glu355' (Supplementary Figure S5). As noted, however,
the most extensive XylR interface is composed of antiparallel
contacts between PBP subdomains. In particular,
subdomain 1 residues from α1 and β2 interact with
subdomain 2 residues from α6 (residues 162-170), α7
(residues 193-205) and β10.

To obtain insight into how D-xylose activates DNA 
binding by XylR, we compared structures of the apo 
and D-xylose bound states. The structures revealed the 
same overall dimer organization whereby the D-xylose- 
binding PBP faces are located on one side of the dimer 
and the DNA-binding domains on the other. However, 
D-xylose binding leads to significant structural changes,
as underscored by resulting rmsds of 1.5 Å for superimposition
of individual subunits and 2.2 Å for overlays of
both subunits in the dimer. Moreover, rmsds of 1.2 Å are
obtained from superimpositions of individual PBP
domains, whereas superimpositions of entire subunits,
including the DNA-binding domain, result in rmsds of
1.7 Å. These findings indicate that D-xylose binding causes
structural changes in the orientation of the subunits within
the dimer as well as in the orientation of the ligand-binding
domain relative to the DNA-binding domain within each subunit.

Examination of the residues near the ligand-binding 
pocket revealed the striking finding that D-xylose binding 
is accompanied by a transition in residues 221-229 from
an α-helix in the apo form to a strand (β10) in the D-xylose
bound state (Figure 3A-C; Supplementary Figure S6).
This helix to strand conversion appears to be triggered
by D-xylose interaction with residues 219-221, as
modelling shows that the side chain of Asp219 is <1.4 Å



from the D-xylose moiety in the apo state (Figure 3A). 
Hence, this residue must move to permit D-xylose 
binding. The presence of the rigid helix may impede this
movement. As a result, in the D-xylose-bound structure 
residues 225-226 buckle out (Figure 3A and B), thus 
creating a binding pocket that permits D-xylose insertion 
(Supplementary Figures S7-S9). What is particularly 
striking about this structural transformation of residues 
221-229 is that these residues lie in the dimerization inter- 
face. Notably, Tyr226 moves from its helical position 
where it hydrogen bonds with Arg240' in the apo structure 
to its strand position in the D-xylose bound structure, 
where it interacts with Asp38' (Supplementary Figure 
S6). In addition, there is a large shift in the position of 
α1 of the neighbouring subunit (Figure 3A and B). This
helix lies at the nexus between the PBP and the 
DNA-binding domain and in fact inserts between the 
two HTH repeat elements within each DNA-binding 
domain subunit (Figure 3A; Supplementary Figure S6). 
As a result, even minor shifts in the position of α1 are
capable of producing significant structural changes within 
the tandem helix-turn-helix containing DNA-binding 
domain. Moreover, because these changes occur in both 




Figure 3. D-xylose binding triggers helix to strand transition. 
(A) Superimposition of the apo (green) and D-xylose bound (blue) 
XylR structure showing a close up of the region undergoing a helix to 
coil transition. The overlay indicates that D-xylose triggers this response 
by forcing Asp219 and the accompanying N-terminal region of the helix 
to move, which requires the helix to unfold. (B) D-xylose binding leads to 
a helix to strand transition. For clarity the strand is shown as a thin 
ribbon in this Figure. (C) Overall result of the helix to strand transition 
upon D-xylose binding is a reorientation of the DNA-binding domains. 
D-xylose is shown as CPK, the D-xylose binding domain as transparent
surfaces and the DNA-binding domains as ribbons. 
subunits of the dimer, the net movement of the DNA-binding
domains is further amplified. This explains the
large differences noted from superimposition of the apo 
and D-xylose bound structures. A notable result of these 
large conformational changes is an increase in the overall 
buried surface area of the dimer (the D-xylose bound dimer
buries 2960 Å² compared with 2446 Å² for the apo
form). Previous work indicated that D-xylose binding activates
XylR for DNA binding (2-5). Hence, D-xylose-induced
conformational changes presumably align the
HTH elements properly for DNA binding; however, a 
complete understanding of the D-xylose-induced 
DNA-binding mode will require a XylR-D-xylose-DNA 
structure. 



D-xylose-induced conformational changes and DNA 
binding stoichiometry 

XylR regulates two co-transcribed operons by binding the 
~37 bp sites, IA and IF. These sites are each composed of 
two direct repeats (with consensus ---gaAa-a--a-AAT--
gaAa-a--a-AAT) (2,3). These binding sites each control a
separate operon: one controls the xylAB cluster and the
other the xylFGH genes (2,3). Interestingly, these two
gene clusters, which are separated by ~360 bp, are
transcribed in opposite directions (Figure 4A). In each 
promoter, the XylR-binding sites are located next to the 
-35 motif recognized by the σ70 subunit of RNA polymerase.
How XylR regulates these two operons is unclear.




[Figure 4 panel labels. IA promoter sequence: CAATTATGTTATTTCACACTGCTATTGAGATAATTCACAAGT. IF promoter sequence: CAAGAAATAAACCAAAAATCGTAATCGAAAGATAAAAATCTG-3'. FP of XylR-IA promoter site.]
Figure 4. XylR DNA binding and activation by D-xylose. (A) Top: organization of the two operons regulated by XylR. Below: sequences
of the xyl promoters (IA and IF), which are transcribed in opposite directions. The arrows represent the 5' to 3' directions of the sequence motifs.
(B) The binding affinity of XylR for the IA promoter. The resulting isotherm revealed a binding affinity of ~33 nM in the presence of D-xylose. In
the absence of D-xylose, no significant binding is observed. (C) The XylR-IF promoter binding isotherm reveals an affinity of ~25 nM (XylR monomer concentration) in the presence
of D-xylose, whereas in the absence of D-xylose no binding is observed. (D) Stoichiometric FP experiment carried out in
the same manner as that shown in (C) with the exception that the IF DNA concentration was increased to 1 µM. This concentration is 40-fold higher
than the Kd, thus ensuring stoichiometric binding. The transition from high- to low-affinity binding resulted in an inflection point of ~1 µM XylR in
the presence of 1 µM IF DNA. This indicated a binding stoichiometry of one XylR dimer to two DNA duplexes. (E) Model of XylR-operator DNA
based on the stoichiometry study in (D) showing two duplexes binding a dimer.



2006 Nucleic Acids Research, 2013, Vol. 41, No. 3 



[Figure 5 panels. (A) Schematic with the IA promoter site and the IF promoter site; segment lengths labelled 74 bp, 42 bp, 111 bp, 42 bp and 102 bp. (B) One site binding model and two site binding model. (C, D) AFM images with XylR and DNA labelled. (E) Schematic with two IF promoter sites; segment lengths labelled 200 bp, 42 bp, 500 bp, 42 bp and 200 bp. (F, G) AFM images.]




Figure 5. The binding mode of XylR to the xyl promoter region observed by atomic force microscopy. (A) Schematic of the xyl promoter region,
which contains two XylR binding sites (IA and IF). The arrow indicates the 5' to 3' direction of the XylR-binding site. (B) A cartoon representation
of the two observed modes of XylR-DNA binding. The AFM data show that a XylR dimer binds first to one DNA site and, subsequently, to the
second site on the same DNA strand, looping the DNA. (C) AFM images of a XylR dimer bound to a single promoter site of the xyl promoter
region. (D) A XylR dimer binding to both promoter sites, creating a DNA loop. However, the intervening DNA between operator sites was too short
to readily visualize via AFM using this DNA site. See Figure 5G. (E) Unnatural DNA substrate used to visualize the DNA loop more clearly. (F) AFM
images of a XylR dimer bound to a single promoter site of the longer DNA substrate shown in (E). (G) AFM images of a XylR dimer binding to the
two promoter sites, leading to clear DNA looping.
Indeed, the initial studies did not ascertain the binding
affinity of XylR for these sites nor did they deduce 
how many XylR molecules interact at each motif. 
Understanding how XylR regulates the two operons 
requires knowledge of the binding stoichiometry of XylR 
for each site, as the structures show that each XylR dimer 
contains four HTH motifs. Hence, four XylR molecules 
could potentially bind this region whereby each HTH 
interacts with one direct repeat. Thus, we performed FP 
experiments to determine the DNA-binding affinities and 
stoichiometry of XylR for its DNA site. These studies 
showed that D-xylose was required to achieve high 
affinity binding to the IA and IF operator sites (Figure 
4B-C). In the presence of 2 mM D-xylose, XylR binds these
operator sites with Kd values of ~33 nM (IA) and ~25 nM (IF). By contrast,
in the absence of D-xylose, saturable binding was not 
observed. Strikingly, the data show that a XylR dimer 
binds two DNA duplexes or promoter sites (Figure 4D). 
Thus, to generate a XylR-DNA model, we docked two 
DNA duplexes onto each XylR subunit using the 
MarA-DNA structure as a guide (Figure 4E). 
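
The arithmetic behind the affinity and stoichiometry estimates can be illustrated with a minimal sketch. It assumes a simple hyperbolic single-site isotherm, valid only when the DNA concentration is far below the Kd, and it is not the fitting procedure used for the FP data. The function names `fraction_bound` and `duplexes_per_dimer` are hypothetical; the numerical inputs are the approximate values reported above, with XylR concentrations expressed per monomer.

```python
# Illustrative sketch only: a simple 1:1 binding model, not the authors' analysis.

def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Fraction of DNA bound, assuming DNA << Kd so free protein ~ total protein."""
    return protein_nM / (kd_nM + protein_nM)


def duplexes_per_dimer(dna_uM: float, inflection_monomer_uM: float) -> float:
    """Stoichiometric regime (DNA >> Kd): duplexes bound per XylR dimer
    at the inflection point of the titration."""
    dimer_uM = inflection_monomer_uM / 2.0  # two monomers per dimer
    return dna_uM / dimer_uM


KD_IA_NM = 33.0  # approximate Kd for the IA site, from the text
KD_IF_NM = 25.0  # approximate Kd for the IF site, from the text

# Half of the DNA is bound when the protein concentration equals the Kd.
print(fraction_bound(KD_IF_NM, KD_IF_NM))

# 1 uM (1000 nM) IF DNA relative to the IF Kd in the stoichiometric experiment.
fold_excess = 1000.0 / KD_IF_NM
print(fold_excess)

# Inflection at ~1 uM XylR monomer with 1 uM DNA: duplexes per dimer.
print(duplexes_per_dimer(1.0, 1.0))
```

Under these assumptions, an inflection point at ~1 µM XylR monomer with 1 µM DNA corresponds to one duplex per monomer, that is, two duplexes per dimer, which matches the stoichiometry stated above.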

The fact that the two operons regulated by XylR are
transcribed in opposite orientations from gene clusters
separated by only ~360 bp, together with the finding that
one XylR dimer binds two separate DNA sites, suggested
the intriguing possibility that one XylR dimer may
interact with both IA and IF via a looping mechanism.
To test this we performed AFM experiments. We first 
looked at XylR binding to the natural promoter region 
encompassing both XylR operator sites. Consistent with 
stoichiometry studies, these analyses (Figures 5A-D) 
strongly suggested that one XylR dimer binds between 
the promoter operator sites and mediates looping. 
However, the intervening DNA between operator sites 
was too short to readily visualize via AFM (Figure 5D). 
Thus, to clearly deduce whether one XylR dimer can bind 
between distant DNA sites, we constructed IF900, which 
encodes two IF promoter sites separated by 500 bp. When
XylR was mixed with this construct, AFM studies revealed clear
evidence for DNA looping by XylR (Figures 5E-G).

The -35 and -10 regions of both xyl promoters possess
poor matches to the optimal σ70 consensus motifs,
suggesting that XylR-D-xylose likely activates transcription by
recruitment of RNA polymerase. DNA looping by XylR
could be an integral mechanism by which XylR performs
its transcription activation function, as it would closely
juxtapose the two promoter sites, perhaps allowing RNA
polymerase recruitment to both sites. This activation
looping contrasts with the repressive outcome of DNA 
looping by AraC, which in its apo state, loops DNA and 
inhibits Pbad transcription (36,37). Transcription activation 
by AraC occurs when it binds L-arabinose and, in collabor- 
ation with CRP, stimulates loop opening (37). By contrast, 
apo XylR does not bind DNA. However, a repressive mode 
of XylR is not required as AraC-L-arabinose binds the xyl 
promoters mediating repression (5). Thus, the combined 
data indicate that under L-arabinose and D-xylose replete 
conditions, AraC-L-arabinose binds tightly to the xyl pro- 
moters and only when L-arabinose is depleted does D-xylose 
bound XylR bind and loop DNA to activate transcription. 



In conclusion, our studies on XylR define a new 
family of DNA-binding proteins, which harbours a 
DNA-binding domain with an AraC-like fold and a 
ligand-binding domain with a LacI/GalR-like structure. 
The ligand-binding domain dimerizes in a distinct antipar- 
allel mode. D-xylose binding causes a helix to strand tran- 
sition in the dimer interface of the D-xylose-binding 
domain that results in dimer rearrangement, which is 
transmitted to the DNA-binding domains. XylR binds 
to two promoters that are transcribed in opposite direc- 
tions. Strikingly, FP and AFM studies indicate that XylR 
binds to two DNA sites per dimer and loops DNA. 
Finally, the XylR-D-xylose structure reveals key determin- 
ants that explain its exquisite selectivity towards D-xylose. 
This knowledge could be used in design efforts towards 
the development of more efficient E. coli biocatalysts. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Figures 1-9. 